Abstract: Smart Grids measure energy usage in real time and
tailor supply and delivery accordingly, in order to improve
power transmission and distribution. For the grids to operate
effectively, it is critical to collect readings from massively
installed smart meters at control centers in an efficient and
secure manner. In this paper, we propose a secure compressed
reading scheme to address this critical issue. We observe that
our collected real-world meter data exhibit strong temporal
correlations, indicating that they are sparse in certain domains.
We adopt the Compressed Sensing technique to exploit this
sparsity and design an efficient meter data transmission scheme.
Our scheme achieves the substantial efficiency offered by
compressed sensing without the need to know beforehand in which
domain the meter data are sparse. This is in contrast to
traditional compressed-sensing based schemes, where such
sparse-domain information is required a priori. We then design a
dedicated dependability scheme that works with our
compressed-sensing based data transmission scheme to make meter
reading reliable and secure. We provide performance guarantees
for the correctness, efficiency, and security of our proposed
scheme. Through analysis and simulations, we demonstrate the
effectiveness of our schemes and compare their performance to
prior art.

I. Introduction 

Smart Grids are playing a significant role in leading the
global electrical grid revolution [1]. A recent trend is to
deploy advanced information, control, and communication
techniques in Smart Grids for better power transmission and
distribution [2]. This trend has attracted significant attention
from government, industry, and academia [3].

One promising wireless-based Smart Grids architecture has been
proposed in [4]. Wireless broadband networks can serve wide-area
and mission-critical utility communications thanks to their
flexibility and high-speed transmission, and service providers
commonly choose them to improve reliability and resiliency
during emergency scenarios. Thus, a hardened commercial wireless
data network can serve as the core of the overall Smart Grids
network and exploit the advantage of elastic deployment with
dynamic routing.

To fully unleash the potential of such wireless-based Smart
Grids and fulfill their design objective, it is critical to
collect data adaptively and wirelessly from smart meters
installed in millions of households, with efficiency and
dependability guarantees [5], [6], [7]. This critical problem
has only been partially explored recently. We present a summary
of related work on this topic in Section II.

In this paper, we provide an efficient and secure solution
to collect measurements from all smart meters in wireless-based
Smart Grids. Inspired by compressed sensing theory [8],
substantial studies [9], [10], [11] have explored how to improve
data transmission efficiency, and the technique has also been
applied in Smart Grids [6]. We exploit the observed correlations
among Smart Grids meter readings and adopt the compressed
sensing technique to design an efficient transmission scheme. We
also propose a protection mechanism tailored to our
compressed-sensing based reading transmission to achieve
reliability and security. In particular, we make the following
contributions:

• From our collected real-world trace, we observe that
the smart meter readings demonstrate strong temporal
correlations. This indicates they are sparse in certain
(unknown) domains.

• We design an adaptive compressed-sensing based scheme
to collect data. The scheme has two salient features. First,
data collection works under arbitrary tree topologies.
Second, it works without the need to know a priori in
which domain the readings are sparse. This is in contrast
to traditional compressed-sensing based schemes, where
such sparse-domain information is required a priori.
A performance guarantee for our scheme is also presented.

• We design a dedicated dependability scheme that works
with our compressed transmission to make it reliable and
secure. Our dependability scheme considers physical link
failures as well as outside and semi-honest inside attacks.

• We carry out extensive numerical experiments using
real-world smart meter readings to evaluate our scheme, from
transmission cost to reconstruction performance.

The rest of our paper is organized as follows. We review
related work in Section II. The compressed reading and
reconstruction scheme is discussed in Section IV. The
corresponding dependable mechanism, in combination with the
transmission scheme, is presented in Section V. Experimental
results and analysis are shown in Section VI. Finally, we
conclude our work in Section VII.

II. Related Work 

The data collection problem in Wireless Sensor Networks
(WSN) has been studied extensively [9]-[11]. Data transmission
in wireless-based Smart Grids shares many similarities with WSN,
such as real-time transmission and dynamic routing; however,
there are two primary differences. First, a Smart Grids network
has only a tree topology, while a WSN has an arbitrary topology.
A specific topology allows a tailored solution design that
maximizes performance. For instance, our scheme utilizes the
tree structure to specify node behaviors and reduce transmission
cost. Second, meter readings exhibit specific correlations that
other WSN data might not. We further utilize the strong temporal
relationships among readings to improve transmission efficiency.

The emerging problem of meter data collection in Smart Grids
has attracted much attention recently. We summarize the
differences between our work and existing work in Table I.

TABLE I
Comparison with previous work for Reading Transmission in
WSN and Smart Grids

Work | A | B | C | D
Bartoli et al. [7] | / | | Individual Data | /
Li and Liu [5]
Individual Data

A: Security; B: Transmission Efficiency; C: Granularity of Transmission;
D: Stream Data Transmission; / means incorporated in work



Bartoli et al. [7] considered collecting individual meter
readings independently and securely. The scheme has low
transmission efficiency, since it does not exploit correlation
across data and every meter measurement is transmitted
independently. Li et al. [6] further explored the correlation
across meter readings and applied compressed sensing to improve
efficiency; however, they devised a centralized security
mechanism based on the wireless AP and did not address
reliability. Li and Liu [5] considered secure aggregate meter
data collection with homomorphic encryption. The scheme ensures
secure transmissions but only collects aggregated readings,
while individual reading collection is required under most
scenarios. In comparison, our scheme exploits correlation across
meter readings and develops a transmission solution based on
compressed sensing. Moreover, reliability and security concerns
are also critical in Smart Grids and have been considered in
[12], [13]; we also develop a dedicated scheme to guarantee
security during transmission.

In recent years, compressed sensing has been explored in
both the signal processing and data transmission communities due
to its high efficiency and good recovery performance [10]. In
[9], C. Luo et al. first considered efficient data transmission
using compressed sensing in WSN; their scheme acts as the
baseline for our work. Then J. Luo et al. [11] applied hybrid
compressed sensing, also to WSN data transmission, and achieved
improved efficiency but did not consider stream data over
multiple time-slots. Reading transmission schemes without
compressed sensing can be easily extended to the stream data
case; however, the extension is non-trivial for schemes using
compressed sensing, as Section IV-D explains. Inspired by this,
we develop an efficient compressed reading transmission scheme
for Smart Grids that works under stream reading collection.
Besides the security guarantee, the differences between our
scheme and [9], [11] are as follows: first, our application
scenario is Smart Grids; second, our scheme achieves good
efficiency while adaptively transmitting stream readings.



TABLE II
Notation List in Algorithms

Symbols | Notations
$G$ | predefined transmission topology
$V$ | Set of transmission participating nodes
$E$ | Set of wireless links
$d_i(t)$ | Meter reading at time $t$ for $i \in V$, $d_i(t) \in \mathbb{R}^+$
$\Phi$ | Sensing Matrix
$\Psi$ | Wavelet Transform
$H$ | Switching matrix
$h[X]$ | hash function for message $X$
$(X)_K$ | encrypt message $X$ with key $K$
$K_i$, $K_i^{-1}$ | public/private key of LHU $i$
$K$, $K^{-1}$ | public/private key list for all LHUs
$K_{DC}$ | public key from data collector
$T_S$ | current time-stamp



III. Problem Setting 

In this section we present the general setting of our
dependable compressed reading scheme in Smart Grids networks and
describe our chief goals. First, a list of key notations is
given in Table II for both the transmission and dependability
mechanisms.

For Smart Grids data transmission, we consider an arbitrary
tree topology as depicted in Fig. 1, where the data collector is
the root. We assume time is divided into equal-length slots
and represent the Smart Grids network as $G = (V, E)$. $V$ is the
set of nodes in the tree. Every node, except the root,
represents a smart meter. For every node $i \in V$, we use
$d_i(t) \in \mathbb{R}^+$ to represent its meter reading at time
$t$. Nodes coordinate to transmit the readings from all meters,
i.e., $\{d_i(t)\}_t$ for all $i \in V$, to the root, which
represents the Data Collector. $E$ is the set of wireless links
over which a node can directly communicate with its parent. We
assume that all links can transmit one message in every slot
simultaneously (through locally orthogonal channels or a
properly imposed scheduler).

Fig. 1. Smart Grids Data Transmission Topology

More specifically, we regard the direction towards the data
collector as upstream and consider three types of nodes in Smart
Grids:

• Forwarder: Legitimate Home Users (LHUs) that reside
mostly downstream in the tree. A Forwarder first
acquires the reading from its attached smart meter, then
forwards it to its parent node together with the data from
its downstream children nodes.

• Aggregator: Legitimate Home Users (LHUs) that usually
reside in the middle of the tree. An Aggregator collects
the reading from its smart meter, aggregates it with the data
received from downstream children, and sends the aggregated
data to its parent. Further explanations are given in
Section IV-C.

• Data Collector: the root node of the tree. It receives all
the readings sent from Aggregators and Forwarders, and
reconstructs the original readings of all smart meters.

Detailed definitions of Forwarder and Aggregator are
specified in Section IV-C, where we introduce our compressed
reading scheme in Algorithm 1.

For dependability-related issues, we first ensure reliability
in the presence of link failures. We then assume that Smart
Grids data transmission is exposed to outside attackers who can
destroy transmitted readings or impersonate LHUs; these attacks
are further explored in Section V. Smart Grids' LHUs are
regarded as semi-honest: they may passively eavesdrop on
readings but do not actively tamper with them.

There are two objectives in designing a dependable compressed
reading scheme. The first objective is to use as few
transmissions as possible while still allowing the Data
Collector to reconstruct all smart meters' readings, i.e.,
$\{d_i(t)\}_t$ for all $i \in V$, up to acceptable bounded
errors. The second objective is to make the transmissions
dependable: by designing a mechanism tailored to the compressed
transmission, we generate a failure-free transmission topology
and defend against outsiders and semi-honest insiders.

IV. Compressed Reading and Reconstruction 

A. Compressed Sensing Preliminary 

We first briefly review the necessary compressed sensing
background. In general, one needs $N$ measurements to fully
recover an $N$-dimensional signal. However, for an
$N$-dimensional signal that is sparse in a certain domain, the
compressed sensing theory [8] states a rather surprising result:
one can fully recover the signal by using only on the order of
$K \log(N/K)$ linear measurements, where $K$ is the sparsity
level defined below.

DEFINITION 1. An $N$-dimensional signal $d$ is said to be
$K$-sparse in a domain $\Psi$ if there exists an $N$-dimensional
vector $x$ such that $d = \Psi x$ and $x$ has at most $K$
non-zero entries ($K < N$).

Remarks: (i) the above definition covers the case where $d$ is
sparse itself, for which we can simply take $\Psi = I$ and
$x = d$. (ii) Many natural signals are sparse in a certain
domain. For example, natural images are sparse in the wavelet
domain [14]. We observe that meter readings are sparse in the
frequency domain.

Let $y$ be an $M$-dimensional vector of linear measurements of
$d$, i.e., $y = \Phi d$, where $\Phi$ is an $M \times N$ sensing
matrix.

DEFINITION 2. An $M \times N$ sensing matrix $\Phi$ is said to
satisfy a Restricted Isometry Property (RIP) of sparsity $K$
($K < N$) if there exists a $\delta_K \in (0, 1)$ such that the
following holds for any $K$-sparse $N$-dimensional vector $z$:

$$(1 - \delta_K)\|z\|_{\ell_2}^2 \le \|\Phi z\|_{\ell_2}^2 \le (1 + \delta_K)\|z\|_{\ell_2}^2,$$

where $\|\cdot\|_{\ell_2}$ represents the $\ell_2$ norm.



It has been shown in [15], [16] that an $M \times N$ matrix
$\Phi$ satisfies RIP with probability $1 - O(e^{-\gamma N})$ for
some $\gamma > 0$ if

• all its entries are independent and identically distributed
Gaussian random variables with mean zero and variance $1/M$,

• and $M > \mathrm{const} \cdot K \cdot \log(N/K)$.
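
As an informal illustration of the near-isometry in Definition 2, the sketch below draws a Gaussian matrix with variance 1/M and measures how far $\|\Phi z\|^2 / \|z\|^2$ strays from 1 on randomly sampled K-sparse vectors. The sizes (M = 64, N = 256, K = 4), the seed, and all function names are assumptions made for this example. Sampling only gives a lower estimate of $\delta_K$ and does not certify RIP, which would require checking every K-sparse vector.

```python
import math
import random

def gaussian_sensing_matrix(m, n, rng):
    """M x N matrix with i.i.d. Gaussian entries, mean 0 and variance 1/M."""
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]

def random_sparse_vector(n, k, rng):
    """N-dimensional vector with exactly K non-zero entries at random positions."""
    z = [0.0] * n
    for idx in rng.sample(range(n), k):
        z[idx] = rng.uniform(1.0, 2.0)
    return z

def energy_ratio(phi, z):
    """Return ||phi z||^2 / ||z||^2."""
    projected = [sum(p * v for p, v in zip(row, z)) for row in phi]
    return sum(v * v for v in projected) / sum(v * v for v in z)

rng = random.Random(0)          # fixed seed, chosen arbitrarily
M, N, K = 64, 256, 4            # illustrative sizes, not taken from the paper
phi = gaussian_sensing_matrix(M, N, rng)
ratios = [energy_ratio(phi, random_sparse_vector(N, K, rng)) for _ in range(200)]

# Largest observed deviation from 1: an empirical lower estimate of delta_K.
delta_hat = max(abs(r - 1.0) for r in ratios)
print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}, delta_hat {delta_hat:.3f}")
```
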
The following observation is due to [8]:

THEOREM 1. Consider an under-determined linear system
$y = \Phi d$, where $y$ is an $M \times 1$ vector and $d$ is an
$N \times 1$ vector that is $K$-sparse in domain $\Psi$. If the
$M \times N$ sensing matrix $\Phi$ satisfies the RIP property
and $\Psi$ is orthonormal, then $d$ can be recovered exactly by
solving the following convex optimization problem:

$$
\begin{aligned}
\min_x \quad & \|x\|_{\ell_1} \\
\text{s.t.} \quad & y = \Phi d, \\
& d = \Psi x.
\end{aligned}
\tag{1}
$$

B. Challenges and Solution Overview 

Energy consumption habits are persistent, which means that
meter readings have strong temporal correlation (detailed in
Section VI); consequently, meter readings in Smart Grids are
sparse in the frequency domain. It is therefore natural to use
compressed sensing to reduce the number of transmissions the
data collector needs in order to collect all the meter readings.

However, it is nontrivial to apply compressed sensing to our
problem. In particular, there are three challenges:

• How to generate a sensing matrix $\Phi$ that satisfies RIP
in Smart Grids networks, such that the data collector can
recover the original data using the same $\Phi$ without
receiving it directly;

• How to select a sparse domain $\Psi$ in which the meter
readings are sparsely represented, since the readings are not
naturally sparse and compressed sensing deals with signals
that are compressible in some domain;

• How to adjust $\Psi$ adaptively to deal with a stream of
data, since we need to recover a stream of data that is not
sparse in a fixed domain.

We address these three challenges and design an efficient
meter reading solution based on compressed sensing. We address
the first challenge by using a pseudo-random number generator
seeded with each node's identity. For the second and third
challenges, we design the transform domain by using a property
of piecewise polynomial signals. We elaborate on our solutions
in the next two subsections.

C. Compressed Reading 

In this section, we give a transmission scheme that addresses
the first challenge of applying compressed sensing to Smart
Grids networks.

Given a Smart Grids network $G(V, E)$ with $|V| = N$, denote by
$S$ the Data Collector. $M$ is the compression factor defined by
the data collector, which broadcasts it to the entire network.
We define a forwarder as a node whose number of children nodes
is less than or equal to $M - 1$, and an aggregator as a node
whose number of children nodes is larger than $M - 1$. For each
node $i \in V$, $ID_i$ is its identity, $C_i$ is the set of its
children nodes, and $F_i$ and $A_i$ are the sets of its
forwarder children nodes and aggregator children nodes,
respectively. $m_i$ is the message it sends out.
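
The forwarder/aggregator rule above can be sketched as a short classification routine. One assumption is made explicit here: the "number of children nodes" is read as the number of all downstream nodes (descendants), which is the reading under which a forwarder relays at most M readings. The function name, the dictionary-based tree encoding, and the example tree are hypothetical.

```python
def classify_nodes(children, root, m):
    """Label every non-root node as 'forwarder' or 'aggregator'.

    children: dict mapping a node to the list of its direct children.
    Assumption: the threshold M - 1 is compared against the number of
    descendants (all downstream nodes), not only the direct children.
    """
    descendants = {}

    def count(node):
        total = 0
        for child in children.get(node, []):
            total += 1 + count(child)
        descendants[node] = total
        return total

    count(root)
    roles = {}
    for node, n_desc in descendants.items():
        if node == root:
            continue  # the root is the data collector
        roles[node] = "forwarder" if n_desc <= m - 1 else "aggregator"
    return roles

# Hypothetical 7-node tree rooted at the data collector "S".
tree = {"S": ["a"], "a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
roles = classify_nodes(tree, "S", m=3)
print(roles)
```

With M = 3, node "a" has five downstream nodes and becomes an aggregator, while "b" (two downstream nodes) and the remaining meters are forwarders.
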

We make the following assumptions, which we consider
reasonable for our scheme:

• Every LHU reports its data synchronously. Also, each
LHU appends its ID to the transmitted message.

• The Data Collector receives measurements periodically from
all registered LHUs; it then recovers the data and estimates
the grid state at that moment.

In Algorithm 1, if there is no aggregator in the network,
then each forwarder simply relays the readings from its children
nodes, together with its own reading, to its parent node in the
tree. Otherwise, the forwarders work in the same way as in the
previous case, while each aggregator generates a weighted sum of
the readings from its forwarder children nodes, combines it with
the messages from its aggregator children nodes, and reports the
result to its parent node. Finally, the data collector obtains
the measurements of all readings for the purpose of
reconstructing the original data. Each node reports at most $M$
messages in the transmission scheme, and the total cost is lower
than that of the scheme in [9]. By applying compressed sensing
to data transmission in Smart Grids, both the bottleneck and the
total cost of the Smart Grids network are significantly reduced.
A detailed discussion is given in Section IV-F.



Algorithm 1: Compressed Reading Algorithm
Input: [ $G(V, E)$, $S$, $ID_i$, $d_i$, $\forall i \in V$, $M$ ]
Output: [ $m_i$, $\forall i \in V$ ]
begin
    Count $|C_i|$, $\forall i \in V$
    Generate $A_i$ and $F_i$, $\forall i \in V$
    if $|\bigcup_{j \in V} A_j| = 0$ then
        $\forall i \in V$, $m_i = \bigcup_j d_j$, $j \in C_i \cup i$
    else
        for $l = 1$ to $M$ do
            if $i$ is forwarder then
                $m_i = \bigcup_j d_j$, $j \in C_i \cup i$
            else
                $i$ generates Gaussian random coefficients $\phi_{lj}$
                using a pseudo-random number generator for all
                $j \in F_i \cup i$ seeded with associated $ID_j$
                $m_i = \sum_{j \in F_i \cup i} \phi_{lj} d_j + \sum_{j \in A_i} m_j$
            $S$ collects the weighted measurement of all nodes,
            represented as $y_l = \sum_{j=1}^{N} \phi_{lj} d_j$
        $S$ obtains the formula $y = \Phi d$, which
        describes all the measurements
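
A minimal simulation of the compressed reading idea is sketched below, under stated assumptions: integer node IDs seed Python's `random.Random`, each meter's column of $\Phi$ is the first M Gaussian draws from that generator, and a node is a forwarder when its subtree holds at most M readings. The tree, the readings, and all function names are hypothetical. The point of the sketch is that the collector, knowing only the IDs, regenerates the same coefficients and so obtains $y = \Phi d$ without $\Phi$ ever being transmitted.

```python
import math
import random

def coefficients(node_id, m):
    """Column of the sensing matrix for one meter: M draws from a PRNG seeded with its ID."""
    rng = random.Random(node_id)
    return [rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)]

def subtree_readings(children, readings, node):
    """All (id, reading) pairs in the subtree rooted at node."""
    pairs = [(node, readings[node])]
    for child in children.get(node, []):
        pairs.extend(subtree_readings(children, readings, child))
    return pairs

def combine(children, readings, node, m, include_own):
    """M weighted sums formed at a node from its children's messages."""
    raw = [(node, readings[node])] if include_own else []
    sums = [0.0] * m
    for child in children.get(node, []):
        kind, payload = node_message(children, readings, child, m)
        if kind == "raw":                      # forwarder child: raw readings with IDs
            raw.extend(payload)
        else:                                  # aggregator child: add its partial sums
            sums = [s + p for s, p in zip(sums, payload)]
    for node_id, value in raw:
        column = coefficients(node_id, m)
        sums = [s + c * value for s, c in zip(sums, column)]
    return sums

def node_message(children, readings, node, m):
    """('raw', pairs) for a forwarder, ('sum', M partial sums) for an aggregator."""
    pairs = subtree_readings(children, readings, node)
    if len(pairs) <= m:
        return ("raw", pairs)
    return ("sum", combine(children, readings, node, m, include_own=True))

# Hypothetical network: node 0 is the data collector, nodes 1..6 are meters.
children = {0: [1], 1: [2, 3], 2: [4, 5], 3: [6]}
readings = {1: 1.2, 2: 0.8, 3: 2.5, 4: 0.4, 5: 1.9, 6: 3.1}
M = 3

y = combine(children, readings, 0, M, include_own=False)   # what the collector receives
y_direct = [sum(coefficients(j, M)[l] * d for j, d in readings.items())
            for l in range(M)]                              # Phi d rebuilt from IDs alone
```
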



D. Compressed Reconstruction 

In this section, we propose a data reconstruction scheme
based on compressed sensing that lets the Data Collector recover
the original data after collecting all $M$ measurements using
the transmission scheme. When the network topology is known to
the Data Collector, it can generate the same sensing matrix
$\Phi$ using the same pseudo-random number generator and the
same IDs. However, exact reconstruction with compressed sensing
requires that the signal be sparse, either directly or in a
certain domain. We propose the scheme below to address this
challenge. The objective of our scheme is for the Data Collector
to reconstruct the meter readings of all smart meters, $d_i(t)$,
$\forall i \in V$, at time $t$. We first look into the simple
case where the Data Collector needs to recover static data.
After that, we address how to deal with stream data.

1) Snapshot Case: In this case, we propose the algorithm
for the Data Collector to reconstruct the static data
$d \in \mathbb{R}^N$.

It is well known that a piecewise polynomial signal is sparse
in the wavelet domain [14].

PROPOSITION 1. [17] If a signal $f$ is equal to a polynomial
of degree less than $J/2$ over the support of a $k$-level
Daub$_J$ wavelet $W_{k,n}$, then the $k$-level coefficient
$x_{k,n} = f \cdot W_{k,n}$ is zero.

Let $H$ be the switching matrix such that the entries of $Hd$
are in ascending order, so that $Hd$ can be regarded as
piecewise polynomial.
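
To illustrate why sorting helps, the sketch below represents the switching matrix $H$ as a sorting permutation and counts the non-zero Haar detail coefficients before and after sorting. Haar is used only because it is the simplest (2-tap) Daubechies wavelet, which annihilates constant pieces; the readings and function names are made up for the example and the paper does not specify this particular wavelet.

```python
import math

def switching_order(d):
    """Permutation perm such that [d[p] for p in perm] is ascending (the rows of H)."""
    return sorted(range(len(d)), key=lambda i: d[i])

def apply_h(perm, d):
    """Compute H d: the readings rearranged into ascending order."""
    return [d[p] for p in perm]

def apply_h_inverse(perm, s):
    """Compute H^{-1} s: put sorted values back at their original positions."""
    d = [0.0] * len(s)
    for rank, p in enumerate(perm):
        d[p] = s[rank]
    return d

def haar_details(signal):
    """Detail coefficients of a full orthonormal Haar decomposition (length must be a power of 2)."""
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.extend((a - b) / math.sqrt(2) for a, b in pairs)
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details

def count_nonzero(values, tol=1e-12):
    return sum(1 for v in values if abs(v) > tol)

d = [5.0, 1.0, 5.0, 1.0, 1.0, 5.0, 5.0, 1.0]   # hypothetical readings with two usage levels
perm = switching_order(d)
sorted_d = apply_h(perm, d)

print(count_nonzero(haar_details(d)))          # unsorted: 4 non-zero details
print(count_nonzero(haar_details(sorted_d)))   # sorted: 1 non-zero detail
```

After sorting, the signal is piecewise constant with a single jump, so only the coarsest detail coefficient survives, and `apply_h_inverse` restores the original meter order.
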

THEOREM 2. Given $y = \Phi d$, where $y$ is an $M \times 1$
vector and $d$ is $K$-sparse in domain $H^{-1}\Psi$. If the
entries of the $M \times N$ sensing matrix $\Phi$ are
independent and identically distributed Gaussian random
variables with mean zero and variance $1/M$, and $\Psi$ is the
wavelet domain, then the Data Collector can recover $d$ exactly
by solving the following convex optimization problem with
probability $1 - O(e^{-\gamma N})$ for some $\gamma > 0$ if
$M > \mathrm{const} \cdot K \cdot \log(N/K)$:

$$
\begin{aligned}
\min_x \quad & \|x\|_{\ell_1} \\
\text{s.t.} \quad & y = \Phi d, \\
& d = H^{-1}\Psi x.
\end{aligned}
\tag{2}
$$

2) Stream Case: We have shown how to reconstruct static data
using the snapshot algorithm. However, the snapshot algorithm
assumes that the switching matrix $H$ is known in advance, which
is usually not the case in practice. Further, instead of a
snapshot, we need to recover a stream of data $\{d_i(t)\}_t$ for
all $i \in V$. We address these challenges by designing an
algorithm that exploits the temporal correlation and our
particular compressed sensing scheme, as follows.

• $t_0$: Each node is treated as a forwarder and reports its
data, so the data collector obtains the exact data
$d(t_0) \in \mathbb{R}^N$. The data collector sorts the data in
ascending order, which yields the switching matrix $H(t_0)$.

• $t_i$, $i = 1, 2, \ldots$: Use the compressed reading scheme
to get weighted measurements $y(t_i) = \Phi d(t_i)$. From the
snapshot case analysis, we know that $d(t_i)$ is $K$-sparse in
$H^{-1}(t_i)\Psi$ and can be reconstructed by solving an
$\ell_1$ minimization problem. The challenge is that we do not
know $H^{-1}(t_i)$ before we reconstruct $d(t_i)$. Thus, the
data collector takes a bold approach and constructs an estimate
$\hat{d}(t_i)$ by solving the following $\ell_1$ minimization
problem, in which the unknown $H^{-1}(t_i)$ is replaced by the
available one, $\hat{H}^{-1}(t_{i-1})$ (with
$\hat{H}(t_0) = H(t_0)$):

$$
\begin{aligned}
\min_x \quad & \|x\|_{\ell_1} \\
\text{s.t.} \quad & y(t_i) = \Phi \hat{d}(t_i), \\
& \hat{d}(t_i) = \hat{H}^{-1}(t_{i-1}) \Psi x.
\end{aligned}
\tag{3}
$$

The data collector sorts the estimated data $\hat{d}(t_i)$ in
ascending order to get the estimated switching matrix
$\hat{H}(t_i)$. The system then proceeds to the $(i+1)$-th round.
The above algorithm has a similar flavor to the
differential-coding based video coding schemes, which encode
the initial frame and then only store the changes from the
previous one [18]. The difference is that, in our algorithm,
reconstruction of the data in the current slot only depends on
the order of the data in the previous slot, while in
differential coding, reconstruction of the current frame depends
on the values of the previous one.

There are two issues to consider for the above algorithm.
First, how accurate is the estimate $\hat{d}(t_i)$ compared to
the real data $d(t_i)$? Second, the data collector recovers
$d(t_{i+1})$ using the estimated switching matrix
$\hat{H}(t_i)$, which already contains error. Will the
reconstruction error then amplify in future rounds and
deteriorate?

Next, we bound the error and show that it does not amplify.



THEOREM 3. Given specific $\delta_K$ satisfying:

$$(1 - \delta_K)\|z\|_{\ell_2}^2 \le \|\Phi\Psi z\|_{\ell_2}^2 \le (1 + \delta_K)\|z\|_{\ell_2}^2$$

for any $K$-sparse $N$-dimensional vector $z$ and definitions of
$\Phi$ and $\Psi$ in (2). $H^{-1}$ is the perturbation matrix.
$\|X\|_2^{(k)}$ denotes the maximum $\ell_2$ norm over matrix
$X$'s arbitrary $k$-column sub-matrices.



1A 



lA 



||$^r||2fe 
II 11/2 



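$$\delta_K < \frac{\sqrt{2}}{(1 + \gamma_A)^2} - 1 \tag{4}$$

$$C = \frac{4\sqrt{1 + \delta_K}\,(1 + \gamma_A)}{1 - (\sqrt{2} + 1)\left[(1 + \delta_K)(1 + \gamma_A)^2 - 1\right]}$$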



Then, the $\ell_2$ norm of the difference between the original
readings, $d$, and the readings recovered from (2), $\hat{d}$,
is bounded as

$$\|d - \hat{d}\|_{\ell_2} \le C \beta_A \gamma_A \|y\|_{\ell_2}. \tag{5}$$



The proof of this theorem is included in Appendix. 

Remarks: (i) Theorem 2 in [19] is a significant intermediate
step in proving our Theorem 3; the difference is that the
original theorem only bounds errors between sparse signals,
while ours bounds the errors of meter readings that are not
originally sparse. (ii) At time-slot $t_i$, we can bound the
error incurred with the estimated switching matrix by letting
$H^{-1} = \hat{H}^{-1}(t_{i-1}) H(t_i)$, replacing $\Psi$ with
$H^{-1}(t_i)\Psi$, and taking $d = d(t_i)$, $y = y(t_i)$, with
$\hat{d}$ the readings recovered from (3). (iii) We can choose
the "worst" $H^{-1}$ to give the largest bound in (5), so error
propagation can be ignored. In real Smart Grids data
transmission, we observe that $H^{-1}$ is far better behaved
than a random perturbation: it only changes partly between two
consecutive time-slots. This is supported by the real-data
experiments in Section VI.

E. Increment Analysis 

We have demonstrated how to quantify the propagated
error under arbitrary interval increments. Here, we consider a
reasonable constraint on the reading increment $n(t_i)$ from
$d(t_i)$ to $d(t_{i+1})$ ($i = 1, 2, \ldots$), and propose
another approach to bound the estimation error. We find that a
stronger error bound can be achieved if the increment meets
certain requirements.

PROPOSITION 2. If $n(t_i)$ is $K$-sparse in domain
$H^{-1}(t_i)\Psi$, then we can reconstruct $d(t_{i+1})$ exactly,
$i = 0, 1, \ldots, \infty$.

The proof of this proposition is included in Appendix.

In real data, the increment may not be exactly $K$-sparse in
domain $H^{-1}(t_i)\Psi$. Even so, we can still bound the
reconstruction error when the increment is approximately
$K$-sparse. The definition of approximately sparse is given
below.

DEFINITION 3. The best $K$-sparse approximation $d_K$ of an
$N$-dimensional signal $d$ is obtained by keeping the $K$
largest entries of $d$ and setting the others to zero. An
$N$-dimensional signal $d$ is said to be approximately
$K$-sparse in a domain $\Psi$ if there exists an $N$-dimensional
vector $x$ such that $d = \Psi x$ and
$\|x - x_K\|_{\ell_1} < \varepsilon$, where $\varepsilon$ is a
positive constant.
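
Definition 3 can be made concrete with a few lines of code. The sketch below computes the best K-sparse approximation of a coefficient vector and the $\ell_1$ norm of the remainder. It assumes that "largest" means largest in magnitude, which is the usual convention; the vector and the function names are hypothetical.

```python
def best_k_sparse(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def l1_tail(x, k):
    """l1 norm of x minus its best k-sparse approximation."""
    x_k = best_k_sparse(x, k)
    return sum(abs(a - b) for a, b in zip(x, x_k))

def is_approximately_sparse(x, k, eps):
    """True when ||x - x_K||_1 < eps, as in Definition 3."""
    return l1_tail(x, k) < eps

# Hypothetical coefficient vector: three dominant entries plus small residue.
x = [4.0, -0.01, 0.0, 3.5, 0.02, -2.0, 0.005, 0.0]
print(best_k_sparse(x, 3))
print(l1_tail(x, 3))
```

Here the three dominant coefficients are kept and the discarded residue has $\ell_1$ norm of about 0.035, so the vector is approximately 3-sparse for any $\varepsilon$ above that value.
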

PROPOSITION 3. If $n(t_i)$ is approximately $K$-sparse in
domain $H^{-1}(t_i)\Psi$, then the estimation error satisfies
$\|d(t_{i+1}) - \hat{d}(t_{i+1})\|_{\ell_2} \le C_0 K^{-1/2} \varepsilon$
for $i = 0, 1, \ldots, \infty$ and some constant $C_0$.

The proof of this proposition is included in Appendix.

Remarks: In real data, we observe strong correlation
between $n(t_i)$ and $d(t_i)$. Therefore, we can consider that
$n(t_i)$ and $d(t_i)$ share the same sparse domain.

F. Performance Analysis 

In this part, we analyze the transmission cost of our scheme.
First, the minimum and maximum transmission costs are given.

THEOREM 4. Over any tree topology, for $N$ LHUs using our
transmission scheme, the minimum transmission cost is $N$, while
the maximum transmission cost is $M(N - M/2 + 1/2)$.

The proof of this theorem is included in Appendix.
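
As a sanity check on the two extremes in Theorem 4 (not a substitute for the proof), the sketch below counts messages under the assumption that a meter whose subtree holds s readings sends min(s, M) messages per round: s raw readings if it is a forwarder, M weighted sums if it is an aggregator. The star and chain topologies and the function names are illustrative choices.

```python
def transmission_cost(children, root, m):
    """Total messages per collection round, assuming each meter sends min(subtree size, M)."""
    total = 0

    def subtree_size(node):
        nonlocal total
        size = 1 + sum(subtree_size(c) for c in children.get(node, []))
        if node != root:              # the data collector itself sends nothing
            total += min(size, m)
        return size

    subtree_size(root)
    return total

def star(n):
    """Collector 0 with n meters attached directly to it."""
    return {0: list(range(1, n + 1))}

def chain(n):
    """Collector 0 followed by n meters in a line."""
    return {i: [i + 1] for i in range(n)}

N, M = 20, 5
print(transmission_cost(star(N), 0, M))    # N = 20
print(transmission_cost(chain(N), 0, M))   # M(N - M/2 + 1/2) = 90
```

Under this counting rule the star topology attains the minimum $N$ and the chain attains $M(N - M/2 + 1/2)$, since the chain's subtree sizes are $1, 2, \ldots, N$ and $\sum_{s=1}^{N} \min(s, M) = M(M+1)/2 + (N - M)M$.
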
Then, we analyze the transmission cost for one special case:
the $p$-ary complete tree.

PROPOSITION 4. For an $N$-node $p$-ary complete tree
transmission topology, the cost of our scheme is
$O(N \log_p M)$.

The proof of this proposition is included in Appendix.
In contrast, the total cost of the scheme in [9] is always
$O(N \cdot M)$, so our scheme achieves better performance.

V. Secure Transmission Mechanism

Previous work such as [3], [5], [6], [20], [21], [22], [23]
considered security in smart grids; however, these works address
security at the data collector, either through secrets from the
AP or through an estimate of the grid state that is checked for
discrepancies with the original, and they assume that no LHU
fails. In contrast, our scheme is a distributed solution in
which both aggregators and forwarders are involved instead of
relying solely on the data collector, and we also consider
reliability issues.

A. Reliability 

To achieve reliable data transmission, we perform a
diagnostic test to detect physical errors, such as link failures
and traffic congestion. Before data transmission, the data
collector publishes a temporary transmission topology G: each
LHU has several outgoing links, including one primary link, and
can communicate through them wirelessly. During the test, each
LHU transmits a test packet over its primary outgoing link to
verify that the link works. If the transmission fails or suffers
a serious delay, the LHU broadcasts a link failure, promotes
another outgoing link to primary, and continues testing. The
test is repeated until the primary outgoing links of all LHUs
work. Using each node's primary outgoing link, we generate a
ready topology G, which is used for reading data transmission.
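
The diagnostic test can be sketched as a simple selection loop. In this sketch, `link_ok` is a hypothetical stand-in for sending a test packet and checking for failure or excessive delay, and each LHU's candidate parents are listed with the primary link first; none of these names come from the paper. The sketch does not verify that the resulting links form a tree, which a real deployment would also need to check.

```python
def select_primary_links(outgoing, link_ok):
    """For each LHU, pick the first candidate outgoing link that passes the test.

    outgoing: dict LHU -> list of candidate parents, current primary first.
    link_ok:  callable (lhu, parent) -> bool, stand-in for the test packet.
    Returns a dict LHU -> chosen parent (the ready topology).
    """
    topology = {}
    for lhu, candidates in outgoing.items():
        for parent in candidates:
            if link_ok(lhu, parent):
                topology[lhu] = parent
                break
        else:
            raise RuntimeError(f"LHU {lhu} has no working outgoing link")
    return topology

# Hypothetical candidate links; the primary link b -> a is assumed to be down.
outgoing = {"a": ["S"], "b": ["a", "S"], "c": ["a", "b"]}
failed = {("b", "a")}
topology = select_primary_links(outgoing, lambda lhu, parent: (lhu, parent) not in failed)
print(topology)
```
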

B. Security 

We protect data transmission from attacks launched by 
either outsiders or semi-honest insiders, and our security model 
is the following: 

1) Insider Adversary: Corrupted LHUs can act as malicious
insider attackers. Following the standard assumptions
in HQ, we assume that insider adversaries are semi-honest, 
namely they execute our algorithms properly, but they want to
use received transmission data to infer other LHUs' consumption
behaviours. In addition, insider adversaries may deny having
sent particular data to the collector.

Insider adversaries do not drop or modify received packets,
and data corruptions are only caused by outsider adversaries.

2) Outsider Adversary: We assume that an outsider adversary
has polynomially bounded computational capacity and can actively
launch the following attacks.

• Data Privacy: the attacker can invade an LHU's privacy to
infer its consumption behaviours;

• Data Tampering: the attacker can tamper with measurements
along a link to alter or forge them and make the data
collector perform an incorrect reconstruction;

• Impersonation: the attacker can imitate an LHU and send
forged data on its behalf to the collector;

• Replay: the attacker can intercept previous transmissions
and resend them later to cause recovery errors.

We defend against the attacks that can be launched by the above
adversaries as follows.

• Data Privacy: we encrypt transmitted data to hide
consumption behaviors. Our transmission scheme requires
that numerical calculations be performed on encrypted
data. To achieve that, we employ the Paillier cryptosystem
[24], which has the following homomorphic properties:

$$E(m_1 + m_2) = E(m_1) * E(m_2), \quad E(\phi \cdot m) = E(m)^{\phi},$$

$$E(\phi_1 m_1 + \phi_2 m_2) = E(m_1)^{\phi_1} * E(m_2)^{\phi_2}, \tag{6}$$

where $E(m_1)$ and $E(m_2)$ are encrypted measurements, and
$\phi_1$ and $\phi_2$ are random coefficients generated by an
LHU with its ID. The Paillier scheme is a public-key
cryptosystem; therefore, if measurement readings are
encrypted with the data collector's public key, only the
data collector can decrypt and reconstruct the readings.

• Data Integrity: to check whether the encrypted
transmissions have been damaged, we use a cryptographic hash
function to verify data integrity. Since measurement data
in our setting is always 4 bytes long, we use the SHA-1
hash function, which produces a 160-bit output, but we
only use the first 64 bits of the hash value for our
integrity verification. This reduces data transmission
cost while maintaining reasonable security.

• Impersonation: We use standard digital signatures to
defend against impersonation attacks and to provide
non-repudiation. Each LHU generates a unique RSA key pair:
its private key is kept secret while its public key is
published. A signature can be generated only by the LHU that
holds the legitimate private key, and it can be verified by
anyone holding the corresponding public key.

• Message Freshness: We use a time stamp to guarantee that
each transmitted message is fresh, and thus defend against
replay attacks. Specifically, an LHU uses its private RSA
key to sign the hashed message together with its time stamp,
which ensures message freshness and integrity simultaneously.
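
The homomorphic property in Eqn. (6) can be checked with a toy Paillier implementation. The sketch below is illustrative only: the primes 1009 and 1013 are far too small to be secure, the generator is fixed to $g = n + 1$, and the coefficients are small non-negative integers. How the scheme maps its Gaussian coefficients to integers is not described in this section, so that step is left out; all function names are made up for the example.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) / n."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def weighted_sum(n, ciphertexts, weights):
    """Ciphertext of sum_j w_j * m_j, computed without decrypting anything."""
    n2 = n * n
    acc = 1
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

rng = random.Random(1)
n, priv = keygen(1009, 1013)
m1, m2 = 1200, 3400          # hypothetical meter readings
phi1, phi2 = 3, 5            # hypothetical integer coefficients
c = weighted_sum(n, [encrypt(n, m1, rng), encrypt(n, m2, rng)], [phi1, phi2])
recovered = decrypt(n, priv, c)
print(recovered)             # 3 * 1200 + 5 * 3400 = 20600
```

An aggregator would play the role of `weighted_sum`, while only the holder of the private key (the data collector) can run `decrypt`.
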

C. Integrating security with our Transmission Scheme 

Our secure transmission scheme includes 3 algorithms as
follows.

Algorithm 2 (notations are explained in Table II) explains
how to validate a received packet: validation fails if the ID
does not belong to a given set or if integrity verification
fails after successful decryption.



Algorithm 2: Validation
Remark: verify the source ID and integrity of Pkt
Data: [ $K$, Pkt, Set, $T_s$ ]
    $K$ is the public key list, Pkt has the typical form, Set is
    the ID set, $T_s$ is the fixed time-stamp
Result: [ Ans, ID, EncData ]
begin
    divide Pkt into E, id and Sig from typical form
    ID $\leftarrow$ id ; EncData $\leftarrow$ E
    if ID $\notin$ Set or $(Sig)_{K[ID]} \neq (h[EncData], T_s)$ then
        Ans = false
    else Ans = true
    return Ans, ID, EncData
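
The validation step can be sketched with the standard library alone. To keep the example self-contained, it uses textbook RSA (no padding) over two small Mersenne primes as a stand-in for a real signature scheme, encodes $(h[EncData], T_s)$ as the 64-bit truncated SHA-1 digest followed by a 4-byte time stamp, and treats a packet as the tuple (EncData, ID, Sig). These encodings, the key sizes, and all function names are assumptions for illustration and are not secure for real use.

```python
import hashlib

def keypair(p, q, e=65537):
    """Textbook RSA key pair from two primes (toy sizes, no padding)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)

def digest64(data):
    """First 64 bits of the SHA-1 digest."""
    return hashlib.sha1(data).digest()[:8]

def token(enc_data, ts):
    """Integer encoding of (h[EncData], T_s): 8 digest bytes followed by a 4-byte time stamp."""
    return int.from_bytes(digest64(enc_data) + ts.to_bytes(4, "big"), "big")

def make_packet(enc_data, lhu_id, ts, priv):
    """Build (EncData, ID, Sig), signing the token with the LHU's private key."""
    n, d = priv
    return (enc_data, lhu_id, pow(token(enc_data, ts), d, n))

def validate(pub_keys, pkt, id_set, ts):
    """Mirror of Algorithm 2: returns (Ans, ID, EncData)."""
    enc_data, lhu_id, sig = pkt
    if lhu_id not in id_set or lhu_id not in pub_keys:
        return (False, lhu_id, enc_data)
    n, e = pub_keys[lhu_id]
    ans = pow(sig, e, n) == token(enc_data, ts)
    return (ans, lhu_id, enc_data)

# Hypothetical LHU with ID 7; 2**89 - 1 and 2**61 - 1 are Mersenne primes.
pub, priv = keypair(2**89 - 1, 2**61 - 1)
pub_keys = {7: pub}
now = 1700000000
pkt = make_packet(b"encrypted-reading", 7, now, priv)

print(validate(pub_keys, pkt, {7}, now)[0])                                  # True
print(validate(pub_keys, (b"tampered", 7, pkt[2]), {7}, now)[0])             # False
print(validate(pub_keys, pkt, {7}, now + 1)[0])                              # False (stale time stamp)
```

A modified payload, an unknown sender ID, or a replayed packet checked against a different time stamp all fail the comparison, which is the behaviour Algorithm 2 describes.
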

Algorithm 3 is the secure transmission scheme for Forwarders.
Forwarder LHU_i first maintains three sets, Rec_P, V, and
Rec_V, which record the received packets, the IDs of its
downstream neighbors, and the IDs of the received packets,
respectively. It then checks all the received packets using
Validation: if a packet passes, the packet and its ID are put
into Rec_P and Rec_V; otherwise, the packet is discarded. After
verifying all received packets, LHU_i knows whether all
downstream neighbors have sent their measurements; if not, it
asks the LHUs in V\Rec_V to resend, since their packets may have
suffered active tampering attacks. Finally, LHU_i encrypts its
own measurement with Paillier, computes the hash value, signs
the hash and time-stamp, and concatenates them to generate a new
packet. It transmits the packets from Rec_P together with its
own packet directly to the predefined parent node.

Algorithm 4 is the security strategy for an aggregator's
transmission. Since aggregator LHU_i need not store and forward
received packets, it maintains only the sets V and Rec_V.
However, it should distinguish forwarders from aggregators
among its downstream neighbors, and it maintains their IDs in
set V_f for forwarders and V_a for aggregators. It then
validates all received packets and compresses the encrypted
messages using the Paillier cryptosystem. For packets from
non-aggregation transmission, LHU_i uses F to store the
compressed value of V_f's encrypted measurements as in
Eqn. (6); the random coefficients are generated using randGen
with the LHU ID. For encrypted measurements from aggregators in
V_a, LHU_i uses A to store the new compressed value obtained
through multiplication as in Eqn. (6). It also reports missing
IDs for resend. Finally, LHU_i encrypts its own measurement,
combines it with A and F to generate the new encrypted
compressed reading, and then produces a new packet with its
LHU ID, hash, and signature. Since LHU_i is an aggregator,
it transmits only one packet to its parent node.
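The homomorphic aggregation step can be illustrated with a toy Paillier instance. The sketch below is an assumption-laden example rather than the paper's implementation: the primes are far too small for real use, `rand_gen` is a hypothetical stand-in for randGen that returns small integer coefficients, and only a single compressed measurement (one row of the measurement matrix) is computed.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key with g = n + 1; these primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # Python 3.8+
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + n * m) % n2 * pow(r, n, n2) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def rand_gen(node_id: int) -> int:
    """Hypothetical stand-in for randGen: a small coefficient seeded by the LHU ID."""
    return random.Random(node_id).randint(1, 9)

rng = random.Random(0)
n, priv = keygen(1009, 1013)
n2 = n * n
readings = {1: 120, 2: 75, 3: 310}      # plaintext readings of three forwarders
own_id, own_reading = 4, 42

F = 1
for node_id, m in readings.items():
    c = encrypt(n, m, rng)                        # arrives encrypted under the collector key
    F = F * pow(c, rand_gen(node_id), n2) % n2    # F = F * EncData^randGen(ID)

A = encrypt(n, 1000, rng)   # already-compressed value from a downstream aggregator

new_data = F * A * encrypt(n, rand_gen(own_id) * own_reading, rng) % n2
expected = (sum(rand_gen(i) * m for i, m in readings.items())
            + 1000 + rand_gen(own_id) * own_reading)
print(decrypt(n, priv, new_data) == expected)
```

Multiplying ciphertexts adds the underlying plaintexts, and raising a ciphertext to a power scales its plaintext, so the aggregator can form the weighted sum without ever decrypting a reading.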



Algorithm 3: SecureTransmission_Forwarder
Remark: Secure transmission for Forwarder LHU
Data: [ i, K, K^{-1}, T_s, G, K_DC ]
Result: [ SendPkt ]
begin
    V <- IDs of LHU[i]'s downstream neighbors in G
    Rec_V <- ∅ ; Rec_P <- ∅
    for all received packets of LHU[i] do
        R = Validation(K, CurrentPkt, V, T_s)
        if R.Ans = true then
            add R.ID to Rec_V, CurrentPkt to Rec_P
    if V \ Rec_V ≠ ∅ then
        request IDs in V \ Rec_V for resend
    E_i <- (m[i])_{K_DC} ; NewPkt <- E_i | i | (h[E_i], T_s)_{K^{-1}}
    SendPkt = Rec_P ∪ {NewPkt}



Algorithm 4: SecureTransmission_Aggregator
Remark: Secure transmission for Aggregator LHU
Data: [ i, K, K^{-1}, t, G, K_DC ]
Result: [ SendPkt ]
begin
    V <- IDs of LHU[i]'s downstream neighbors in G
    V_f, V_a <- IDs of forwarders / aggregators from G
    Rec_V <- ∅ ; F <- 1 ; A <- 1
    for all received packets of LHU[i] do
        R = Validation(K, CurrentPkt, V, t)
        if R.Ans = true then
            Rec_V = Rec_V ∪ {R.ID}
            if R.ID ∈ V_f then
                F = F * (R.EncData)^{randGen(R.ID)}
            else A = A * (R.EncData)
    if V \ Rec_V ≠ ∅ then
        request IDs in V \ Rec_V for resend
    NewData <- F * A * (randGen(i) * m[i])_{K_DC}
    SendPkt <- NewData | i | (h[NewData], t)_{K^{-1}}

VI. Evaluation

A. Experiments for Data Transmission

1) Settings: In the transmission efficiency comparison, we
generate arbitrary tree topologies in which nodes are randomly
located with the data collector at the center to simulate Smart
Grids networks, and compare our scheme with previous schemes.
Here, we use box plots [25] to describe the statistical
information of the transmission cost.

In the performance evaluation, we use data from the Stanford
Powernet open project, where readings are real-time energy
consumption from household appliances [26]. We collected
readings of 128 appliances every twenty minutes over six
working days, since advanced household smart meters can
report measurements at a minimum interval of 15 minutes [27].
After preprocessing the data to filter out invalid (negative
or null) readings, we obtain 396 groups of data for these 128
appliances. Each group includes the data of one transmission
round for the 128 nodes. For simplicity, we assign each node an
ID from 1 to 128. Numerical operations are performed in the
MATLAB environment.

2) Transmission Efficiency: We first generate 20 arbitrary
transmission networks as tree topologies with 128 nodes and
with 1024 nodes, respectively. We then assign each tree a
specific compression factor M, ranging around 0.3N, where N is
the number of participating nodes; our companion work [9] has
shown that M = 0.3N can achieve satisfactory recovery. The
transmission cost is evaluated over the 20 different topologies
for each specific M. We use box plots to represent the
information from the 20 trees under each specific M and each
specific scheme.

We compare our cost with that of the baseline scheme from [9]
and that of transmission without Compressed Sensing
aggregation. As Fig. 2(a) and Fig. 2(c) show, since the
baseline scheme makes each node's outgoing link carry exactly M
packets, its transmission cost remains unchanged under
different topologies once M and N are given, and its box plot
collapses to a single value. Its overall transmission cost
always exceeds the upper bound of our scheme. As for
transmission without aggregation, we found it to be more
efficient than the baseline scheme. This result partly comes
from the fact that our generated tree topologies span in both
width and depth; the baseline scheme would gain more advantage
on topologies that grow mainly in length and thus resemble a
chain.

Fig. 2. Overall Transmission Comparison of 128 nodes and 1024 nodes

(f) Performance of Round 330 (g) Performance of Round 396 (h) L2 norm of Estimation Difference
Fig. 3. Reconstruction Performance of Selected Transmission Rounds

We then compare our transmission cost with that of the
non-C.S. scheme. Fig. 2(b) and Fig. 2(d) show the box plots of
the results. For each M, our scheme reduces the transmission
cost by approximately 15% compared with the non-C.S. scheme in
the worst case; thus, it outperforms non-aggregation in general.
(a) Correlation for Readings (b) Correlation for Increment

Fig. 4. Correlation for Increment and Readings 



3) Data Correlation Analysis: We analyze the collected
readings and their per-interval increments to reveal
correlations.

First, starting from round 2, we compare the readings of the
128 appliances at the current time slot with those at the
previous one, and obtain Fig. 4(a). It shows strong
correlations between the readings of every two consecutive
time slots: almost all correlations are larger than 99.95%.
This observation demonstrates that the order of readings at
one time slot does not change much in the next.

Then, we compare the readings at the previous time slot with
the increment between the previous time slot and the current
one, and obtain Fig. 4(b). We observe that over 96.7% of the
correlations are larger than 0.8, which indicates strong
correlations between the readings and the increments.
Therefore, the increments and the readings have similar
patterns and thus share the same sparse domain.

4) Reconstruction Performance: Since our re-sorted readings
can be regarded as a piecewise polynomial, which has a
K-sparse representation in the wavelet domain, and since N
equals 128 in our data collection, we choose the 7-level Haar
wavelet domain as the sparsity transform.
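The effect of re-sorting on Haar-domain sparsity can be checked with a short sketch. It uses synthetic uniform readings rather than the Powernet data, and the helper names `haar` and `significant` as well as the 5% threshold are assumptions chosen for illustration.

```python
import math
import random

def haar(signal):
    """Full orthonormal Haar transform; len(signal) must be a power of two."""
    out = list(signal)
    n = len(out)
    while n > 1:                      # 7 passes for 128 samples
        half = n // 2
        avg = [(out[2 * k] + out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        dif = [(out[2 * k] - out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def significant(coeffs, frac=0.05):
    """Count coefficients whose magnitude exceeds frac * max magnitude."""
    top = max(abs(c) for c in coeffs)
    return sum(1 for c in coeffs if abs(c) > frac * top)

rng = random.Random(1)
readings = [rng.uniform(0, 1000) for _ in range(128)]   # synthetic, not the Powernet data
raw = haar(readings)
resorted = haar(sorted(readings))
print(significant(raw), significant(resorted))
```

With N = 128 = 2^7 samples, the full transform has exactly 7 levels, and the sorted sequence concentrates its energy in far fewer coefficients than the unsorted one.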

Here, we choose 7 rounds uniformly for evaluation. In
Fig. 3(a) to Fig. 3(g), we compare the reconstruction using our
scheme with the reconstruction that uses the real order of the
current readings; the original readings are also plotted.
First, we observe that the difference between the
reconstruction with our scheme, which uses the last round's
estimated order, and the original readings does not amplify
after 396 rounds of transmission; our scheme still achieves
recovery within acceptable bounded errors. Moreover, the
difference between the recovery with the original data order
and that with our scheme is quite small.

Fig. 5. SNR between Original Order and Estimated Order

According to the analysis in Section IV-D, when we use the
order of this round's estimated readings to reconstruct the
next ones, the errors incurred by choosing a somewhat
inaccurate order are always constrained within a threshold that
scales with $\|y\|_2$. Fig. 3(h) shows the $\ell_2$ norm of the
estimation difference between the measurements with the
estimated order and those with the original order: for all 7
enumerated rounds, our estimation errors are much smaller than
the error bounds. The bound increases over rounds because the
$\ell_2$ norm of the compressed measurements y increases;
however, because of the strong correlations shown in Fig. 4(a),
data sparsity is well preserved in the DWT domain over multiple
rounds, which contributes to the reconstruction performance.

To further quantify the difference, we use the SNR to evaluate
the performance of our scheme. Taking the power of the original
readings as $P_{signal}$ and the power of the difference
between the reconstructed values and the original readings as
$P_{noise}$, we have

$$\mathrm{SNR} = 10 \lg \left( \frac{P_{signal}}{P_{noise}} \right),$$

and larger values represent better performance. From Fig. 5,
for all 396 transmission rounds, the SNRs of our scheme's
reconstruction are generally lower than those of the
reconstruction using the original readings' order, because of
the errors incurred by choosing a somewhat inaccurate order;
however, we still achieve an SNR larger than 20 at all times.
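The SNR metric above can be computed as in the following sketch. The function name `snr_db` and the two-sample example are illustrative assumptions; power is taken as the sum of squares, which gives the same ratio as the mean of squares.

```python
import math

def snr_db(original, reconstructed):
    """SNR in dB: 10 * log10(P_signal / P_noise)."""
    p_signal = sum(x * x for x in original)
    p_noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if p_noise == 0:
        return math.inf          # perfect reconstruction
    return 10 * math.log10(p_signal / p_noise)

original = [10.0, 10.0]
reconstructed = [9.0, 11.0]
print(snr_db(original, reconstructed))   # power ratio 200 / 2 = 100
```

A power ratio of 100 corresponds to 20 on this scale, which is the floor reported for the scheme over all rounds.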

Therefore, our scheme achieves good performance even when
using the last round's estimated order. The incurred errors do
not propagate or amplify over time because strong correlations
exist across readings, so small perturbations of the order
matrix do not degrade our performance.

B. Security Cost 

Due to the dependable transmission scheme in Section V, the
overall transmission cost increases in packet size. The
useful message length increases by 13 bytes, from 6 bytes
(4 for the original reading, 2 for the LHU ID) to 19 bytes
(8 for the encrypted reading, 2 for the LHU ID, 9 for the
signature with hash and time-stamp). In practice, when
considering the other information required for transmission,
such as MAC and PHY headers, these 13 bytes occupy only a
small part of the overall transmitted data and do not degrade
our transmission performance.



VII. Conclusion 

In this paper, we have proposed a new scheme for efficient 
and secure meter reading in Smart Grids based on compressed 
sensing. This is the first attempt to solve the problems of
efficiency, security and individual data transmission
simultaneously. Our scheme works for collecting stream data,
and the incurred estimation errors do not propagate over time.

We observe strong temporal correlations among real meter
readings, which indicate their sparsity in a certain domain.
Building upon this observation, we propose the compressed
reading scheme, which works over arbitrary tree topologies
and reduces the total transmission cost needed to collect and
recover all the meter readings. In contrast to traditional
CS-based transmission schemes, which require the sparse domain
to be known a priori, our scheme can recover stream data
without prior knowledge of the sparse domain. We prove that
the reconstruction error is bounded for every instance of the
stream data and does not drift over time. Through numerical
experiments, we observe that our scheme reduces the
transmission cost significantly compared with a common
benchmark [9] and the non-aggregation scheme. Owing to the
strong temporal correlations among the collected real-world
data, we achieve good reconstruction performance. Meanwhile,
we tailor a specific security scheme to ensure the reliability
and security of data transmission, and it incurs only minor
extra overhead.

VIII. Appendix 

A. Proof of Theorem 

Proof: The original readings are $d = \Psi x$, where $x$ is
the sparse representation of $d$ in domain $\Psi$. Let
$\hat{x}$ denote the solution to the reconstruction problem.
Since $H^{-1}$ is blind during reconstruction, we use
$\Psi \hat{x}$ as the recovered readings $\hat{d}$. Since the
wavelet matrix $\Psi$ is orthonormal,

$$\|d - \hat{d}\|_{\ell_2} = \|\Psi (x - \hat{x})\|_{\ell_2} = \|x - \hat{x}\|_{\ell_2}.$$

From the definitions in and Theorem 2 in [19], we can
directly bound the reconstruction errors in our scheme with
$\Psi$, $\Phi$, $\delta_K$, $y$ and "worst". Since the noise
entry $e$ equals 0, our perturbation only comes from matrix
$H^{-1}$.

B. Proof of Propositions 2 and 3

Proof: Since $y(t_{i+1}) = \Phi d(t_{i+1})$ and
$y(t_i) = \Phi d(t_i)$, we have
$y(t_{i+1}) - y(t_i) = \Phi [d(t_{i+1}) - d(t_i)] = \Phi n(t_i)$.
Therefore, the estimation error is

$$\|\hat{d}(t_{i+1}) - d(t_{i+1})\|_{\ell_2} = \|d(t_i) + \hat{n}(t_i) - d(t_i) - n(t_i)\|_{\ell_2} = \|\hat{n}(t_i) - n(t_i)\|_{\ell_2}, \quad i = 0, 1, \ldots, \infty.$$

Let $z(t_i)$ and $\hat{z}(t_i)$ denote the representations of
$n(t_i)$ and $\hat{n}(t_i)$ in domain $H^{-1}(t_i)\Psi$,
respectively. Then

$$\|\hat{n}(t_i) - n(t_i)\|_{\ell_2} = \|H^{-1}(t_i)\Psi \hat{z}(t_i) - H^{-1}(t_i)\Psi z(t_i)\|_{\ell_2} = \|H^{-1}(t_i)\Psi (\hat{z}(t_i) - z(t_i))\|_{\ell_2} = \|\hat{z}(t_i) - z(t_i)\|_{\ell_2}.$$

From Theorem 1.1 in [28],

$$\|\hat{z}(t_i) - z(t_i)\|_{\ell_2} \le C_0 K^{-1/2} \|z_K(t_i) - z(t_i)\|_{\ell_1}.$$

• If $n(t_i)$ is K-sparse in domain $H^{-1}(t_i)\Psi$, then
$\|z_K(t_i) - z(t_i)\|_{\ell_1} = 0$. We can reconstruct
$d(t_{i+1})$ exactly.

• If $n(t_i)$ is approximately K-sparse in domain
$H^{-1}(t_i)\Psi$, then
$\|\hat{d}(t_{i+1}) - d(t_{i+1})\|_{\ell_2} \le C_0 K^{-1/2} \|z_K(t_i) - z(t_i)\|_{\ell_1} \le C_0 K^{-1/2} \epsilon$.

(a) K Leafs-Transmission Topology (b) Merging two Nearest Branches
Fig. 6. Maximum Transmission Cost Proof



D. Proof of Proposition 

Proof: Let N denote the number of nodes in the network
excluding the data collector, M the number of packets
transmitted by each aggregator, L the number of layers of the
given tree, and TotalCost the overall number of transmitted
packets.

Obviously, there are $p^i$ nodes in the $i$-th layer, thus
$\sum_{j=0}^{L} p^j = N + 1$. Each node in the $i$-th layer has
$\sum_{j=0}^{L-i} p^j - 1$ children nodes. Assume the
$(L-l)$-th layer is the first layer where the nodes are
aggregators, which means that $\sum_{j=0}^{l} p^j - 1 \ge M$
and $\sum_{j=0}^{l-1} p^j - 1 < M$. To this end, the nodes in
the $j$-th layer, $j = 1, 2, \ldots, L-l$, are aggregators and
the others are forwarders. TotalCost of the scheme is the
combination of two parts: the cost of forwarders and the cost
of aggregators.



C. Proof of Theorem 

Proof: Under the non-C.S. strategy, the number of packets
transmitted along each LHU's outgoing link $l_i$ ranges from 1
to M, while it is always M under the CS strategy. Accordingly,
the overall number of transmitted packets $\sum_{i=1}^{N} l_i$
can vary from $N$ to $N \cdot M$, obtained by filling every
link with 1 or M packets, respectively.

The minimum cost N can be achieved through broadcasting, where
all N LHUs directly transmit 1 packet towards the collector.
However, it is impossible to fill each link with M packets,
because a tree topology has at least one leaf LHU, whose
outgoing link carries only one packet.

We now discuss the maximization case. Assume the cost is
maximized under a topology with K leaf LHUs, as in Fig. 6(a).

First, we choose two neighboring LHUs i and j and find their
nearest common ancestor (NCE) Q as in Fig. 6(b); thus, node
Q connects only the two branches i and j as downstream
children. Now we append branch i to the tail of branch j and
consider the change in cost.

Since whether an LHU is an aggregator or a forwarder depends
only on its downstream nodes, the roles of Q and its upstream
nodes remain unchanged after appending. Thus we only consider
the change in branches i and j (including $L_i$ and $L_j$),
while the other transmission costs stay constant. Let the
numbers of nodes in branches i and j be $K_i$ and $K_j$,
respectively; we distinguish the following three cases.

• $K_i, K_j \le M - 1$. After appending, the transmission cost
increases from $(K_i^2 + K_j^2 + K_i + K_j)/2$ to
$((K_i + K_j)^2 + K_i + K_j)/2$.

• $K_i \le M - 1$, $K_j \ge M$. After appending, the cost
changes from $K_j M - M^2/2 + M/2 + (K_i + 1)K_i/2$ to
$K_j M - M^2/2 + M/2 + K_i M$. Due to $K_i \le M - 1$, the
cost also increases. The case $K_i \ge M$, $K_j \le M - 1$ is
similar.

• $K_i \ge M$, $K_j \ge M$. After appending, the cost increases
from $(K_i + K_j)M - (M^2 - M)$ to
$(K_i + K_j)M - (M^2 - M)/2$.

When merging two neighboring branches under their NCE, the
number of leaf LHUs decreases by 1 and the transmission cost
increases. Iteratively performing this operation, the overall
transmission cost keeps increasing until only one leaf LHU is
left.

The maximum cost is therefore achieved under the chain
topology with only one leaf LHU, and the cost is
$M(N - M/2 + 1/2)$.
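The two extremes in this proof can be checked numerically with a small sketch. It assumes that each LHU's outgoing link carries min(subtree size, M) packets, which is how the case formulas above behave; the function names and the `parent` dictionary representation (with 0 denoting the data collector) are hypothetical.

```python
def subtree_sizes(parent):
    """parent maps each LHU ID to its parent ID; 0 denotes the data collector."""
    size = {v: 0 for v in parent}
    for v in parent:
        u = v
        while u != 0:          # every ancestor's subtree contains v
            size[u] += 1
            u = parent[u]
    return size

def transmission_cost(parent, M):
    # Assumed link load: a node forwards every packet of its subtree until the
    # count reaches M, after which it aggregates and sends exactly M packets.
    return sum(min(s, M) for s in subtree_sizes(parent).values())

N, M = 10, 3
chain = {k: k - 1 for k in range(1, N + 1)}   # LHU 1 is next to the collector
star = {k: 0 for k in range(1, N + 1)}        # every LHU is one hop away

print(transmission_cost(chain, M), M * (N - M / 2 + 1 / 2))   # chain cost vs. closed form
print(transmission_cost(star, M))                             # minimum cost N
```

For N = 10 and M = 3 the chain cost is 1 + 2 + 8 * 3, which agrees with the closed form, while the one-hop topology attains the minimum N.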